Stochastic Natural Language Generation
Author
A. H. Oh
Abstract
We describe a corpus-based approach to natural language generation (NLG). The approach has been implemented as a component of a spoken dialog system, and a series of evaluations was carried out. Our system uses n-gram language models, which have been found useful in other language technology applications, in a generative mode. It is not yet clear whether simple n-grams can adequately model human language generation in general, but we show that this ubiquitous modeling technique can be applied successfully to natural language generation for spoken dialog systems. In this paper, we discuss applying corpus-based stochastic language generation at two levels: content selection and sentence planning/realization. At the content selection level, output utterances are modeled by bigrams, and the appropriate attributes are chosen using bigram statistics. In sentence planning and realization, corpus utterances are modeled by n-grams of varying length, and new utterances are generated stochastically. Through this work, we show that a simple statistical model alone can generate appropriate language for a spoken dialog system. The results point to a promising avenue for using a statistical approach in future NLG systems.
Figure 1: NLU and NLG (diagram labels: Natural Language Understanding, Natural Language Generation, Surface Realization, Semantic (Syntactic) Representation)
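The idea of running an n-gram model "in a generative mode" can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the function names `train_ngrams` and `generate`, and the `<s>`/`</s>` boundary markers are all assumptions made for illustration. The sketch counts which word follows each (n-1)-word history in a small set of utterances, then samples new utterances word by word in proportion to those counts.

```python
import random
from collections import defaultdict

# Hypothetical boundary markers and toy corpus (not from the paper).
START, END = "<s>", "</s>"
corpus = [
    "what time would you like to leave",
    "what time would you like to arrive",
    "where would you like to go",
]

def train_ngrams(utterances, n=2):
    """Count which word follows each (n-1)-word history in the corpus."""
    counts = defaultdict(lambda: defaultdict(int))
    for utt in utterances:
        tokens = [START] * (n - 1) + utt.split() + [END]
        for i in range(n - 1, len(tokens)):
            history = tuple(tokens[i - n + 1:i])
            counts[history][tokens[i]] += 1
    return counts

def generate(counts, n=2, max_len=20, rng=None):
    """Sample one utterance, choosing each word in proportion to its count."""
    rng = rng or random.Random()
    history = (START,) * (n - 1)
    words = []
    while len(words) < max_len:
        followers = counts.get(history)
        if not followers:
            break  # unseen history: nothing to sample from
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        if word == END:
            break
        words.append(word)
        # Slide the history window forward by one word.
        history = (history + (word,))[1:]
    return " ".join(words)

if __name__ == "__main__":
    model = train_ngrams(corpus, n=2)
    print(generate(model, n=2, rng=random.Random(0)))
```

Because each word depends only on the preceding history, a bigram model can splice together parts of different corpus utterances (for example, "where would you like to leave"). A larger `n` keeps output closer to the corpus at the cost of less variety, which may be one reason to experiment with n-grams of varying length.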
Similar resources
Towards Automatic Generation of Natural Language Generation Systems
Systems that interact with the user via natural language are in their infancy. As these systems mature and become more complex, it would be desirable for a system developer if there were an automatic method for creating natural language generation components that can produce quality output efficiently. We conduct experiments that show that this goal appears to be realizable. In particular we di...
Introduction to the Special Issue on Natural Language Generation
There are two sides to natural language processing. On the one hand, work in natural language understanding is concerned with the mapping from some surface representation of linguistic material expressed as speech or text--to an underlying representation of the meaning carried by that surface representation. But there is also the question of how one maps from some underlying representation of m...
Two-Stage Stochastic Natural Language Generation for Email Synthesis by Modeling Sender Style and Topic Structure
This paper describes a two-stage process for stochastic generation of email, in which the first stage structures the emails according to sender style and topic structure (high-level generation), and the second stage synthesizes text content based on the particulars of an email element and the goals of a given communication (surface-level realization). Synthesized emails were rated in a prelimin...
Standardization of the generation process in a multilingual environment
Natural language generation has received less attention within the field of natural language processing than natural language understanding. One possible reason is the non-standardization of the input for generation systems. This is an obstacle to the systematic planning of the process of developing generation systems. We propose the use of UNL as a possible standard for standardizing generatio...
Evaluation Metrics for Generation
Certain generation applications may profit from the use of stochastic methods. In developing stochastic methods, it is crucial to be able to quickly assess the relative merits of different approaches or models. In this paper, we present several types of intrinsic (system internal) metrics which we have used for baseline quantitative assessment. This quantitative assessment should then be augmen...